Comparing Testing approaches under Non-Proportional Hazards.

1. Weighted Log-rank test (WLRT).

Peto-Peto.
Modified Peto-Peto.
Tarone-Ware.
Gehan-Breslow/Wilcoxon.
Fleming-Harrington.

2. Modestly Weighted Log-rank test.

Unstratified
Stratified

3. Max Combo test.

Unstratified
Stratified

Note on the Use of Custom R Functions:

The survminer package can be used to perform the Peto–Peto, modified Peto–Peto, Tarone–Ware, and Gehan–Breslow/Wilcoxon tests. However, it returns only p-values, not the corresponding test statistics. A custom R function from Kassambara’s GitHub repository fills this gap: it implements weighted log-rank tests internally and supports the log-rank, Gehan–Breslow, Tarone–Ware, Peto–Peto, modified Peto–Peto, and Fleming–Harrington tests. It reproduces functionality previously available through survMisc::comp(), which is no longer available on CRAN. The custom functions used for each test in the R document were derived from this GitHub-based implementation. Because the original function also supports k-group comparisons, only the sections relevant to pairwise comparisons were retained.
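
To make the weighting schemes concrete, the following base-R sketch computes a two-sample weighted log-rank chi-square statistic for each of the weights named above. It is not Kassambara's function or `survMisc::comp()`: the function name `weighted_logrank()`, its arguments, and the toy data are hypothetical, and the weights follow the usual textbook definitions (1, \(n\), \(\sqrt{n}\), the Peto–Peto survival estimate \(\tilde S\), \(\tilde S \cdot n/(n+1)\), and \(\hat S(t^-)^\rho (1-\hat S(t^-))^\gamma\)), which is an assumption about what the custom function does.

```r
# Two-sample weighted log-rank test (illustrative sketch, base R only).
weighted_logrank <- function(time, status, group,
                             weight = c("logrank", "gehan", "tarone",
                                        "peto", "modpeto", "fh"),
                             rho = 0, gamma = 0) {
  weight <- match.arg(weight)
  g <- as.integer(factor(group)) - 1L            # 0 = first level, 1 = second level
  stopifnot(length(unique(g)) == 2L)

  tj <- sort(unique(time[status == 1]))          # distinct event times
  n  <- vapply(tj, function(t) sum(time >= t), numeric(1))             # pooled at risk
  n1 <- vapply(tj, function(t) sum(time >= t & g == 1L), numeric(1))   # at risk, group 1
  d  <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))
  d1 <- vapply(tj, function(t) sum(time == t & status == 1 & g == 1L), numeric(1))

  km      <- cumprod(1 - d / n)                  # pooled Kaplan-Meier S(t_j)
  km_left <- c(1, km[-length(km)])               # left limit S(t_j-)
  s_tilde <- cumprod(1 - d / (n + 1))            # Peto-Peto survival estimate

  w <- switch(weight,
    logrank = rep(1, length(tj)),
    gehan   = n,                                 # Gehan-Breslow / Wilcoxon
    tarone  = sqrt(n),                           # Tarone-Ware
    peto    = s_tilde,                           # Peto-Peto
    modpeto = s_tilde * n / (n + 1),             # modified Peto-Peto
    fh      = km_left^rho * (1 - km_left)^gamma  # Fleming-Harrington(rho, gamma)
  )

  # Weighted observed-minus-expected events and hypergeometric variance
  u <- sum(w * (d1 - n1 * d / n))
  v <- sum(ifelse(n > 1,
                  w^2 * n1 * (n - n1) * d * (n - d) / (n^2 * (n - 1)),
                  0))
  chisq <- u^2 / v
  c(chisq = chisq, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Hypothetical toy data: status 1 = event, 0 = censored
toy <- data.frame(
  time   = c(1, 3, 5, 7, 2, 4, 6, 8),
  status = c(1, 1, 0, 1, 1, 0, 1, 1),
  group  = rep(c("A", "B"), each = 4)
)

res <- sapply(c("logrank", "gehan", "tarone", "peto", "modpeto"), function(wt) {
  with(toy, weighted_logrank(time, status, group, weight = wt))
})
print(round(res, 4))
```

Each column of `res` holds the chi-square statistic and p-value for one weighting scheme, which is the pair of quantities compared against SAS in the tables that follow.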

Analysis	Supported in R	Supported in SAS	Match	Notes
WLRT- Peto-Peto	Yes	Yes	Yes	In R, the `survminer::surv_pvalue(method = "S1")` function computes the p-value for the Peto–Peto test. However, the `survminer` package does not provide a function for generating the corresponding test statistics. The custom R function can be used to perform this test and obtain both the p-value and chi-square statistic. The resulting values are comparable to those produced by SAS but not consistent with results from the `coin::logrank_test()` and `survival::survdiff()` functions. In SAS, this test is implemented using the `LIFETEST` procedure with a `STRATA` statement and the `TEST=peto` option.
WLRT- Modified Peto-Peto	Yes	Yes	Yes	The `survminer::surv_pvalue(method = "S2")` function in R generates the p-value for the modified Peto–Peto test. The custom R function can be used to obtain both the chi-square statistic and the p-value. In SAS, this test is performed using the `LIFETEST` procedure with a `STRATA` statement and the `TEST=modpeto` option.
WLRT-Tarone-Ware	Yes	Yes	Yes	The `coin::logrank_test()` function in R performs the Tarone–Ware test when the argument `type = "Tarone-Ware"` is specified. The `survminer::surv_pvalue(method = "sqrtN")` function computes the p-value for this test. However, because the `survminer` package does not provide a built-in function to return the test statistic, the custom R function can be used to compute the chi-square statistic along with a p-value consistent with the `method = "sqrtN"` implementation. The results obtained from the custom function and the `survminer` package agree with those produced in SAS using `TEST=TaroneWare` with the STRATA statement in PROC LIFETEST. In contrast, the results from `coin::logrank_test()` do not match the SAS output. The `survminer::surv_pvalue()` function computes p-values from `survfit` objects by comparing survival curves. Its default method is `survdiff`, which performs the standard log-rank test.
WLRT- Gehan-Breslow	Yes	Yes	Yes	In R, `survminer::surv_pvalue(method = "n")` and the custom R function both yield the p-value for the Gehan–Breslow test, using its canonical weighting scheme. The `coin::logrank_test()` function performs this test when the argument `type = "Gehan-Breslow"` is specified. In SAS, the equivalent test is obtained by specifying `TEST=Wilcoxon` in the `LIFETEST` procedure. However, the results produced by `coin::logrank_test()` are not consistent with the SAS output.
WLRT- Fleming-Harrington	Yes	Yes	Yes	This test is computed in R using the `nphRCT::wlrt()` and `coin::logrank_test()` functions. In SAS, specify `test=FH(`\(\rho,\gamma\)`)` in the `LIFETEST` procedure. The results produced by `nphRCT::wlrt()` and the `LIFETEST` procedure are consistent.
Max Combo	Yes	Yes	Yes	In R, the test can be implemented using `nph::logrank.maxtest()`, which defaults to a two-sided test. The SAS `LIFETEST` procedure has no built-in functionality for this test; it must be implemented via a SAS macro. Results from the R implementation and the SAS macro are similar. A stratified max-combo test can be performed in both R and SAS. Additionally, the choice of weights can be modified, as demonstrated in the SAS file.
Modestly Weighted Log-rank	Yes	No	No	In R, this test can be performed using the `nphRCT::wlrt()` function and specifying either the `t_star` or the `s_star` parameter. Here, `s_star` represents the fixed survival probability threshold \(s^*\), whereas `t_star` denotes the time point at which the pooled survival probability reaches \(s^*\) (see the referenced documentation for the definition of this test’s weight function). A stratified version of the test can be implemented by incorporating the `strata()` function in the model formula. This approach provides both the individual test statistics for each stratum and the combined test statistic. To the knowledge of the CAMIS contributors, there is no direct implementation of this test in the SAS `LIFETEST` procedure.

Comparison Results.

\[H_0 : S_1(t) = S_2(t) \ \forall\, t \quad \mbox{vs.} \quad H_1 : S_1(t) \neq S_2(t) \mbox{ for some } t.\]

Note: coin::logrank_test() uses the asymptotic distribution by default and reports a \(Z\) test statistic. Its square is asymptotically chi-square distributed with one degree of freedom, \(Z^2 \sim \chi^2_{(1)}\).
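
Because of this relationship, a \(Z\) statistic can be put on the same scale as the chi-square statistics reported by SAS simply by squaring it. The short base-R sketch below uses the Peto–Peto \(Z\) value from the results table; the variable names are illustrative only.

```r
# Z statistic reported by coin::logrank_test() for the Peto-Peto test (results table)
z <- 3.0423

# Squaring Z gives a statistic comparable to a chi-square with 1 degree of freedom
chisq_from_z <- z^2

# The two-sided normal p-value and the chi-square p-value are the same quantity
p_from_z     <- 2 * pnorm(-abs(z))
p_from_chisq <- pchisq(chisq_from_z, df = 1, lower.tail = FALSE)

print(c(chisq = chisq_from_z, p_normal = p_from_z, p_chisq = p_from_chisq))
```

Comparing `chisq_from_z` with the SAS chi-square in the same row shows directly whether the two implementations agree.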

Test	Statistic	Function in R	R Result	Function in SAS	SAS Result	Match	Notes
WLRT- Peto-Peto	Chi-square	`Custom R function` `coin::logrank_test()` (asymptotic) `survival::survdiff()`	9.8238 Z=3.0423 9.9000	`PROC LIFETEST` with `STRATA group/test=peto`	9.8238	Yes No No	`survminer::surv_pvalue()` returns only the p-value; the custom R function returns both the chi-square statistic and the corresponding p-value.
	P-Value	`survminer::surv_pvalue()` `Custom R function` `coin::logrank_test()` (asymptotic) `survival::survdiff()`	0.0017 0.0017 0.0023 0.0020	`PROC LIFETEST` with `STRATA group/test=peto`	0.0017	Yes Yes No No
WLRT- Modified Peto-Peto	Chi-square	`Custom R function` `coin::logrank_test()`	9.7491 Z=3.0276	`PROC LIFETEST` with `STRATA group/test=modpeto`	9.7491	Yes No
	P-Value	`survminer::surv_pvalue()` `Custom R function` `coin::logrank_test()`	0.0018 0.0018 0.002465	`PROC LIFETEST` with `STRATA group/test=modpeto`	0.0018	Yes Yes No
WLRT- Tarone-Ware	Chi-square	`Custom R function` `coin::logrank_test()`	9.4230 Z=2.9636	`PROC LIFETEST` with `STRATA group/test=taroneware`	9.4230	Yes No
	P-Value	`survminer::surv_pvalue()` `Custom R function` `coin::logrank_test()`	0.0021 0.0021 0.0030	`PROC LIFETEST` with `STRATA group/test=taroneware`	0.0021	Yes Yes No
WLRT- Gehan-Breslow/Wilcoxon	Chi-square	`Custom R function` `coin::logrank_test()`	8.2593 Z=2.7863	`PROC LIFETEST` with `STRATA group/test=wilcoxon`	8.2593	Yes No
	P-Value	`survminer::surv_pvalue()` `Custom R function` `coin::logrank_test()`	0.0041 0.0041 0.0053	`PROC LIFETEST` with `STRATA group/test=wilcoxon`	0.0041	Yes Yes No
WLRT- Fleming-Harrington	Chi-square	`nphRCT::wlrt()`	FH(0.5,0.5)=10.3122 FH(1,1)=9.8019 FH(0,1)=9.5455 FH(0.5,2)=8.32428 FH(1,0)=9.9	`PROC LIFETEST` with `STRATA group/test=FH()`	FH(0.5,0.5)=10.3122 FH(1,1)=9.8019 FH(0,1)=9.5455 FH(0.5,2)=8.32428 FH(1,0)=9.9	Yes
		`coin::logrank_test()`	FH(0.5,0.5): Z=3.0582 FH(1,1): Z=2.9720 FH(0,1): Z=2.9256 FH(0.5,2): Z=2.7163 FH(1,0): Z=3.0423			No
	P-Value	`nphRCT::wlrt()`	FH(0.5,0.5)=0.0013 FH(1,1)=0.0017 FH(0,1)=0.0020 FH(0.5,2)=0.0041 FH(1,0)=0.0017	`PROC LIFETEST` with `STRATA group/test=FH()`	FH(0.5,0.5)=0.0013 FH(1,1)=0.0017 FH(0,1)=0.0020 FH(0.5,2)=0.0041 FH(1,0)=0.0017	Yes
		`coin::logrank_test()`	FH(0.5,0.5)=0.0022 FH(1,1)=0.0030 FH(0,1)=0.0034 FH(0.5,2)=0.0066 FH(1,0)=0.0023			No
Modestly Weighted Log-rank (unstratified)	Chi-square	`nphRCT::wlrt()`	11.2786	`No direct implementation`	Null	No	In addition to the standard log-rank test, this function can perform several types of modestly weighted log-rank tests and the Fleming-Harrington(\(\rho,\gamma\)) test.
	P-Value	`nphRCT::wlrt()`	0.0008	`No direct implementation`	Null	No
Modestly Weighted Log-rank (stratified)	Chi-square	`nphRCT::wlrt()`	strata1=7.5185 strata2=3.7418 Combined=10.8359	`No direct implementation`	Null	No
	P-Value	`nphRCT::wlrt()`	strata1=0.0061 strata2=0.0531 Combined=0.0010	`No direct implementation`	Null	No
Max Combo (unstratified)	Chi-square	`nph::logrank.maxtest()`	3.30 (Z test)	`SAS macro`	3.30152	Yes	In R, the test is two-sided by default unless specified otherwise.
	P-Value	`nph::logrank.maxtest()`	0.00196 \(\approx\) 0.0020; Bonferroni-adjusted p-value = 0.00385	`SAS macro`	0.0020 The `SAS macro` does not perform Bonferroni adjustment.	Yes	In R, `logrank.maxtest()` evaluates the multivariate normal probability with the Genz–Bretz algorithm (`mvtnorm::GenzBretz()`), a quasi-Monte Carlo method. Its randomness makes the unadjusted p-value vary slightly between runs, so `set.seed()` is needed for reproducibility. In the `SAS macro`, a Monte Carlo simulation, `x = RANDNORMAL(n, mean,corr2)`, draws 5,000,000 samples from a k-dimensional multivariate normal distribution with a fixed seed (1).
Max Combo (stratified)	Chi-square	`strata.MaxCombo::SMCtest()`	Z1=3.1813 Z2=3.1813 Z3=3.2738	`SAS macro`	3.18132	Yes Yes No	In R, the test outputs multiple p-values corresponding to different covariance estimators, whereas the `SAS macro` produces a combination test with a single `Z max` and a single p-value `p`. The first `z.max` and `pval` in the R output are the closest to those of the `SAS macro`.
	P-Value	`strata.MaxCombo::SMCtest()`	p1=0.0030 p2=0.0026 p3=0.0021	`SAS macro`	0.0034	Closest (slight difference) No No	The `SAS macro` was modified to accommodate a stratifying variable. The adjustments are documented in the SAS document.
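
The role of the seed in the max-combo p-value can be illustrated with a base-R sketch of the Monte Carlo idea: draw multivariate normal vectors and count how often the largest absolute component exceeds the observed \(Z_{max}\). This is neither the `nph::logrank.maxtest()` code nor the SAS macro. The function name `maxcombo_p_mc()`, the number of component tests (four), and the exchangeable correlation matrix are assumptions made purely for illustration; in a real analysis the correlation matrix is estimated from the data.

```r
z_max <- 3.30152   # observed max statistic (SAS macro value from the table)
k     <- 4         # ASSUMED number of component weighted log-rank tests

# Bonferroni bound: smallest two-sided component p-value times the number of tests
p_single <- 2 * pnorm(-abs(z_max))
p_bonf   <- min(1, k * p_single)

# Monte Carlo estimate of P(max_i |Z_i| >= z_max) under H0, with a fixed seed
maxcombo_p_mc <- function(z_max, corr, n_sim = 2e5, seed = 1) {
  set.seed(seed)                                  # fixes the random draws
  k <- nrow(corr)
  # Rows of z are draws from N(0, corr): iid normals times the Cholesky factor
  z <- matrix(rnorm(n_sim * k), nrow = n_sim, ncol = k) %*% chol(corr)
  mean(rowSums(abs(z) >= abs(z_max)) > 0)
}

# HYPOTHETICAL correlation between the component statistics
corr <- matrix(0.6, nrow = k, ncol = k)
diag(corr) <- 1

p_mc <- maxcombo_p_mc(z_max, corr)
print(c(bonferroni = p_bonf, monte_carlo = p_mc))
```

Calling `maxcombo_p_mc()` twice with the same `seed` returns the same estimate, while changing the seed shifts it slightly, which mirrors the run-to-run variation described in the notes above.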

Summary and Recommendation.

Combining several weighted log-rank statistics is a robust alternative to a single weighted log-rank test for detecting differences between survival curves under non-proportional hazards. However, some authors have urged caution about combination tests, because they risk producing results that are statistically significant but not clinically meaningful. For instance, when treatment is uniformly worse than control, Max Combo can still have a high chance of rejecting the null hypothesis in favour of treatment. Magirr & Burman developed the modestly weighted log-rank test to counter these issues, especially for delayed effects; its weights are controlled so that a worse treatment effect at early time points is not rewarded.
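
The capping idea can be illustrated with a short base-R sketch of the modestly weighted log-rank weights, \(w(t) = 1/\max\{\hat S(t^-), s^*\}\) with \(s^* = \hat S(t^*)\), following the weight function described in the nphRCT documentation cited in the references. The function name `mw_weights()` and the toy data are hypothetical, and this is not the `nphRCT::wlrt()` implementation.

```r
# Modestly weighted log-rank weights at each distinct event time (sketch).
# Supply exactly one of s_star (survival threshold) or t_star (time point).
mw_weights <- function(time, status, s_star = NULL, t_star = NULL) {
  stopifnot(xor(is.null(s_star), is.null(t_star)))

  tj <- sort(unique(time[status == 1]))                 # distinct event times
  n  <- vapply(tj, function(t) sum(time >= t), numeric(1))
  d  <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))

  km      <- cumprod(1 - d / n)                         # pooled Kaplan-Meier S(t_j)
  km_left <- c(1, km[-length(km)])                      # left limit S(t_j-)

  if (is.null(s_star)) {
    # s* is the pooled Kaplan-Meier estimate at t_star (1 before the first event)
    s_star <- if (any(tj <= t_star)) km[max(which(tj <= t_star))] else 1
  }

  # Weights start at 1, never decrease, and are capped at 1 / s_star
  data.frame(time = tj, weight = 1 / pmax(km_left, s_star))
}

# Hypothetical pooled toy data: status 1 = event, 0 = censored
time   <- c(1, 3, 5, 7, 2, 4, 6, 8)
status <- c(1, 1, 0, 1, 1, 0, 1, 1)

print(mw_weights(time, status, s_star = 0.5))
```

Early events receive weight 1 and later events are up-weighted only until the pooled survival estimate falls to `s_star`, so a late benefit is emphasised without rewarding early harm.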

The Wilcoxon test reported in the SAS documentation corresponds to the Gehan-Breslow test in R. For the Peto-Peto, modified Peto-Peto, Gehan-Breslow/Wilcoxon and Tarone-Ware tests, survminer::surv_pvalue() and the custom R function can be used to reproduce the results of the SAS procedures. Both compute these tests from their weighting definitions; for example, they use the size of the risk set and the square root of the size of the risk set as weights for the Gehan-Breslow/Wilcoxon and Tarone-Ware tests, respectively. coin::logrank_test() and survival::survdiff() implement these tests differently, as discussed earlier. coin implements a general framework for conditional inference procedures, commonly known as permutation tests. survival::survdiff() uses the hypergeometric variance formulation to implement the Mantel-Cox log-rank test and is based on the \(G^\rho\) family of tests.

References.

wlrt() documentation: https://search.r-project.org/CRAN/refmans/nphRCT/html/wlrt.html
survdiff() documentation: https://www.rdocumentation.org/packages/survival/versions/3.8-3/topics/survdiff
survminer package documentation: https://cran.r-project.org/web/packages/survminer/survminer.pdf
logrank.maxtest() documentation: https://search.r-project.org/CRAN/refmans/nph/html/logrank.maxtest.html
Robust modestly weighted log-rank tests documentation: https://arxiv.org/html/2412.14942v1
nphRCT package documentation: https://cran.r-project.org/web/packages/nphRCT/nphRCT.pdf
LIFETEST procedure documentation: https://documentation.sas.com/doc/en/statug/15.2/statug_lifetest_syntax01.htm
Combination weighted log-rank tests documentation: https://support.sas.com/resources/papers/proceedings20/5062-2020.pdf
LIFETEST procedure documentation: https://support.sas.com/documentation//cdl/en/statug/68162/HTML/default/viewer.htm#statug_lifetest_details16.htm
Stratified modestly weighted log-rank test documentation: https://cran.r-project.org/web/packages/nphRCT/vignettes/weighted_log_rank_tests.html
Stratified Max-Combo documentation: https://cran.r-project.org/web/packages/strata.MaxCombo/strata.MaxCombo.pdf